Abstract. We explore the effect of discounting and experimentation in a simple
model of interacting adaptive agents. Agents belong to either of two types and each has
to decide whether to participate in a game or not, the game being profitable when there
is an excess of players of the other type. We find the emergence of large fluctuations
as a result of the onset of a dynamical instability which may arise discontinuously
(increasing the discount factor) or continuously (decreasing the experimentation rate).
The phase diagram is characterized in detail and noise amplification close to a
bifurcation point is identified as the physical mechanism behind the instability.

1. Introduction

Individual optimizing behavior is sometimes sufficient for a population of agents to 
coordinate on efficient outcomes. For example, competitive behavior in markets allows 
buyers and sellers to coordinate their demand and supply, and drivers trying to minimize 
transit time may avoid crowded routes, thus reducing the chances of traffic jams. Such 
consequences of individual behavior have been thoroughly studied in economics within 
an equilibrium framework. In many situations, however, the idealized conditions
postulated in economics (e.g. perfect information and rationality) may
be unrealistic [1]. When individuals are engaged in contexts involving many other
individuals, as in financial markets, it is normally more realistic to assume that they 
acquire information from their environment and from other agents as they learn the 
relative efficiency of their strategies/actions. Experimentation - i.e. the fact of 
sometimes playing sub-optimal actions - and flexibility - the ability to change strategy in 
response to a changing environment - become important aspects of a learning dynamics 
in an evolving system. The former means that agents' behavior is best represented by a
stochastic choice model [2]. Flexibility implies instead that agents should discount past
payoffs in their learning behavior, as they might not be relevant for the current state
of their environment (at the same time, discounting past payoffs too strongly may not
allow agents to learn the full complexity of their environment [3]). It is thus essential
to understand how the efficiency of learning is affected by individual stochastic choice 
and by payoff discounting. 

Here we address these issues in an extremely simple setting that allows us to draw sharp 
conclusions and to achieve a full understanding of the results. The model we consider is 
a stylized description of situations where two different groups of agents, taken of equal 
size for the sake of simplicity, interact, each of them providing a resource for agents of 
the other type. Examples of this generic setting include buyers and sellers - which is the 
situation we shall informally refer to henceforth - but also (heterosexual) individuals of 
opposite sex attending a party: for each of them the party might be most interesting if 
the majority of the participants are of the opposite sex. In addition to this, individuals 
may have a priori incentives to participate in the game. In our model, individuals take 
decisions based on an estimate of the payoff, which they compute from the outcomes of 
the game in the recent past. 

The model falls in the class of Minority Games (MGs) [4], which have been studied 
extensively in the recent past. Most theoretical results have been confined so far to the 
case of infinite memory, where agents do not discount payoffs. Numerical simulations for 
more complex versions of the model than that discussed here show that, interestingly, 
strong fluctuations can arise when the discount factor is introduced [3,5]. Remarkably, 
these fluctuations arise after a much longer time than any characteristic time scale in 
the system. Understanding the origin of such non-trivial fluctuations is indeed one of 
the motivations of the present work. 

Strong fluctuations and dynamical instabilities are persistent and ubiquitous in a 
multitude of social systems that can be modeled with interacting adaptive agents (like 
economies, financial markets, traffic, elections, web-communities, etc.; see e.g. [6,7]), 
hence the issues raised here have a rather general relevance. Dynamic instabilities in
models of interacting agents have also been addressed in the economics literature (see
Ref. [8] for a review), but the role of stochastic fluctuations has only recently been
recognized [9].

The aim of the present paper is to present a model which is simple enough to allow
us to understand the origin of strong fluctuations in a system of boundedly rational 
interacting agents. In particular we will focus on the interplay between discounting and 
the stochastic nature of the learning process. 

We find that, in a region of parameter space, the system crosses a discontinuous
transition as the discount factor is increased, and enters a region where two distinct
dynamical steady state solutions exist. The emergence of strong fluctuations is then an
activated process and, as such, these materialize (and dematerialize) after times which
can be extremely long. This same transition arises continuously, at a critical threshold,
when the randomness (or experimentation) in the choice behavior is decreased. At a still
higher threshold a stable solution reappears, coexisting with the strongly fluctuating one.

Dynamical instabilities in a simple minority game with discounting

2. The model 

We consider a system of $N$ buyers and $N$ sellers. We denote by $B$ and $S$ the sets
of buyers and sellers, respectively. At time step $t$, agent $i$ can either play the game
($n_i(t) = 1$) or abstain ($n_i(t) = 0$). We assume that he chooses to play with probability

$$\mathrm{Prob}\{n_i(t) = 1\} = \frac{1 + \tanh[\Gamma U_i^{\pm}(t)]}{2} \qquad (1)$$

where $\Gamma > 0$ is the (uniform) learning rate of agents, while the functions $U_i^{\pm}(t)$,
representing the cumulated score of buyers (upper sign, $i \in B$) and sellers (lower sign,
$i \in S$), respectively, evolve according to

$$U_i^{\pm}(t + 1) = (1 - \lambda) U_i^{\pm}(t) \mp A(t) - \epsilon \qquad (2)$$

Here $\epsilon$ is a cost (or incentive if $\epsilon < 0$) which agents incur for participation. This
represents outside opportunities, such as investing in a risk free asset instead of buying.
$A(t)$ stands for the participation imbalance (or 'excess demand'), defined as

$$A(t) = \frac{1}{N} \left[ \sum_{i \in B} n_i(t) - \sum_{i \in S} n_i(t) \right] \qquad (3)$$

Finally the constant $\lambda$ ($0 < \lambda < 1$) plays the role of a discount factor, introducing a
finite memory on the score. In words, agents estimate a score for participating in the
game taking an exponential moving average of a payoff. The payoff $\mp A(t) - \epsilon$ depends
on how many more agents of the other type had recently participated (i.e. on $A(t)$) and
on a constant cost $\epsilon$. Agents base their decision on whether to play or not by learning
how profitable it was to play in the past. Note that the reinforcement term in the
learning dynamics is independent of the agent, hence we can drop the index $i$ from $U$.
For the analysis which follows, it is convenient to separate the sum and the difference
of $U^{\pm}$ by introducing the variables

$$u(t) = \Gamma \, \frac{U^{+}(t) + U^{-}(t)}{2} \qquad (4)$$
$$y(t) = \Gamma \, \frac{U^{+}(t) - U^{-}(t)}{2} \qquad (5)$$

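so that $U^{\pm} = (u \pm y)/\Gamma$. The dynamics of $u$ is deterministic and it can be easily solved
to yield

$$u(t) = \left[ u(0) + \frac{\Gamma\epsilon}{\lambda} \right] (1 - \lambda)^t - \frac{\Gamma\epsilon}{\lambda} \qquad (6)$$

so that $u$ converges over times of the order of $1/\lambda$ to its asymptotic value $-\Gamma\epsilon/\lambda$. We
shall henceforth neglect such a transient, and set $u = -\Gamma\epsilon/\lambda$. The time evolution of $y$
is instead described by

$$y(t + 1) = y(t)(1 - \lambda) - \Gamma A(t). \qquad (7)$$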

In order to understand the structure of fluctuations, we shall now focus on the above 
process. 
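
The stochastic process (7) is straightforward to simulate. The Python sketch below is one possible implementation and not the code used for the figures: the function name `simulate_y`, its arguments and the use of NumPy are assumptions made here for illustration. It fixes u at its asymptotic value and draws the numbers of participating buyers and sellers from binomial distributions, which is equivalent to sampling each n_i(t) independently with the probability (1).

```python
import numpy as np

def simulate_y(N, Gamma, lam, eps, steps, y0=0.0, seed=0):
    """Iterate y(t+1) = (1 - lam) * y(t) - Gamma * A(t), Eq. (7).

    Hypothetical helper: u is held at its asymptotic value -Gamma*eps/lam,
    i.e. the transient of Eq. (6) is neglected, as in the text.
    """
    rng = np.random.default_rng(seed)
    u = -Gamma * eps / lam
    y = y0
    ys = np.empty(steps)
    for t in range(steps):
        # Participation probabilities of a buyer and of a seller, Eq. (1)
        p_buy = 0.5 * (1.0 + np.tanh(u + y))
        p_sell = 0.5 * (1.0 + np.tanh(u - y))
        # Numbers of participating buyers and sellers (sums of N Bernoulli draws)
        n_buy = rng.binomial(N, p_buy)
        n_sell = rng.binomial(N, p_sell)
        A = (n_buy - n_sell) / N          # participation imbalance, Eq. (3)
        y = (1.0 - lam) * y - Gamma * A   # score difference update, Eq. (7)
        ys[t] = y
    return ys
```

The mean of `ys**2` over the stationary part of the trajectory gives a numerical estimate of the fluctuations of y for a given choice of the parameters.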
3. Deterministic approximation

For $N \to \infty$ a deterministic theory holds. Indeed the quantity $A(t)$ satisfies the law of
large numbers, i.e. it converges almost surely to

$$\langle A|y \rangle = \frac{\tanh(u + y) - \tanh(u - y)}{2} \qquad (8)$$

(with $\langle \ldots |y \rangle$ we denote expected values with respect to the distribution of $n_i(t)$, at fixed
$U^{\pm}$). This in turn implies that (7) is well approximated by

$$y(t + 1) = y(t)(1 - \lambda) - \Gamma \langle A|y(t) \rangle. \qquad (9)$$

For finite $N$, the choices $n_i(t)$ are drawn independently, so that, if $y(t) = y$, $A(t)$ has variance

$$\langle \delta A^2|y \rangle = \frac{2 - \tanh^2(u + y) - \tanh^2(u - y)}{4N} \qquad (10)$$
$$= \frac{\cosh^{-2}(u + y) + \cosh^{-2}(u - y)}{4N} \qquad (11)$$

which vanishes as $N \to \infty$.
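
As a quick consistency check, the conditional mean (8) and the two equivalent forms (10) and (11) of the variance can be evaluated numerically. The function names below are hypothetical and chosen here only for illustration; the snippet is a sketch, not part of the original analysis.

```python
import math

def mean_A(u, y):
    """Conditional mean of A at fixed y, Eq. (8)."""
    return 0.5 * (math.tanh(u + y) - math.tanh(u - y))

def var_A(u, y, N):
    """Conditional variance of A written with tanh, Eq. (10)."""
    return (2.0 - math.tanh(u + y) ** 2 - math.tanh(u - y) ** 2) / (4.0 * N)

def var_A_cosh(u, y, N):
    """The same variance written with 1/cosh^2, Eq. (11)."""
    return (math.cosh(u + y) ** -2 + math.cosh(u - y) ** -2) / (4.0 * N)
```

The two variance expressions agree because 1 - tanh^2(x) = 1/cosh^2(x); at y = 0 both reduce to 1/(2N cosh^2(u)).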

Equation (7) is a nonlinear map with multiplicative noise. It possesses a trivial fixed
point $y^* = 0$, which is stable if $|1 - \lambda - \Gamma[1 - \tanh^2(u)]| < 1$, i.e. when

$$\Gamma < (2 - \lambda) \cosh^2(\Gamma\epsilon/\lambda). \qquad (12)$$

For $\epsilon = 0$, one recovers the stability condition of the original MG, $\lambda + \Gamma < 2$ with $2N$
players [10]. The role of $\epsilon$ is then to stabilize the system, as it increases the right-hand
side of (12), independently of its sign. Note that, for $\Gamma \geq 2 - \lambda$, (12) can be re-written as

$$|\epsilon| > \frac{\lambda}{\Gamma} \, \mathrm{acosh} \sqrt{\frac{\Gamma}{2 - \lambda}} \qquad (13)$$

Also note that increasing $\lambda$ drives the system closer to the instability.
Figure 1 reports the stability condition of the $y^* = 0$ fixed point in the $(\lambda, \Gamma)$ plane
(full lines). A notable feature of these curves is the existence of two transitions as $\Gamma$ is
varied while $\lambda$ and $\epsilon$ are kept constant. This is confirmed by the numerical simulations
shown in Fig. 2, where the fluctuations $\langle y^2 \rangle$ of $y$ around its fixed point are plotted as a
function of $\Gamma$ at fixed $\epsilon$ and $\lambda$, for different values of $N$.
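
The stability condition (12) and the two transitions met on increasing Γ at fixed λ and ε can be checked with a few lines of Python. This is a minimal sketch; the function names are hypothetical and introduced here only for illustration.

```python
import math

def gamma_threshold(Gamma, lam, eps):
    """Right-hand side of Eq. (12), evaluated at u = -Gamma*eps/lam."""
    return (2.0 - lam) * math.cosh(Gamma * eps / lam) ** 2

def trivial_fixed_point_is_stable(Gamma, lam, eps):
    """True when the fixed point y* = 0 satisfies condition (12)."""
    return Gamma < gamma_threshold(Gamma, lam, eps)

def multiplier(Gamma, lam, eps):
    """Slope of the deterministic map (9) at y* = 0."""
    u = -Gamma * eps / lam
    return 1.0 - lam - Gamma * (1.0 - math.tanh(u) ** 2)
```

For example, with λ = 0.35 and ε = 0.05 the fixed point is stable at Γ = 1, unstable at Γ = 3 and stable again at Γ = 30. Because Γ appears on both sides of (12), the condition is tested pointwise rather than solved for Γ in closed form.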

Figure 1. Stability in the parameter space. The trivial fixed point is stable on the
left of the respective $\epsilon$ lines (continuous curves) and below the $\Gamma + \lambda = 2$ line (dashed
line). The stability lines of the oscillating solution (dashed curves) are shown only for
$\epsilon = 0.05$ and $\epsilon = 0.1$ for the sake of clarity.

Figure 2. Fluctuations of $y(t)$ for $\epsilon = 0.05$, $\lambda = 0.35$ as a function of $\Gamma$. In the unstable
region (intermediate $\Gamma$), fluctuations are dominated by the periodic oscillation of $y(t)$,
giving rise to a $\langle y^2 \rangle$ that can be computed analytically (dashed line).

In the region where $y^* = 0$ is unstable, a different solution must be considered.
Indeed, besides the trivial fixed point, there is also a period two solution of the form
$y(t) = (-1)^t z^*$, where $z^*$ solves the equation

$$z = \frac{\Gamma}{2 - \lambda} \langle A|z \rangle = \frac{\Gamma}{2 - \lambda} \, \frac{\sinh(2z)}{\cosh(2u) + \cosh(2z)} \qquad (14)$$

This admits again a trivial solution $z^* = 0$, which is stable in the same region where
the $y^* = 0$ solution is. In addition, it can also have a solution $z^* \neq 0$ which is stable in
the region where the $y^* = 0$ solution is unstable. The stability of this solution may be
studied precisely in the same way as before, and it reveals that stability extends beyond
the region where the $y^* = 0$ solution is unstable, as shown in Fig. 1. In particular, it
extends to the region of large $\Gamma$ for $\lambda > 2\epsilon/(1 + \epsilon)$, where the solution takes the simple
asymptotic form $z \simeq \Gamma/(2 - \lambda)$ ($\Gamma \gg 1$). This region is limited by a dashed curve in
Fig. 1, which indicates that the transition from the $z^* \neq 0$ solution to the $y^* = 0$ solution
is discontinuous in the region where the latter is stable, whereas it is continuous in the
region where the fixed point $y^* = 0$ is unstable. In the region between the dashed and
the solid line both solutions coexist.
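
The non-trivial solution of Eq. (14) has no closed form, but it can be located numerically. The sketch below uses a grid scan followed by bisection. This numerical approach, the function names and the grid size are assumptions made here for illustration and are not taken from the original analysis. The right-hand side of (14) is written with hyperbolic tangents, as in Eq. (8), to avoid overflow at large arguments.

```python
import math

def rhs_eq14(z, Gamma, lam, eps):
    """Right-hand side of Eq. (14), with u = -Gamma*eps/lam."""
    u = -Gamma * eps / lam
    mean_A = 0.5 * (math.tanh(u + z) - math.tanh(u - z))
    return Gamma / (2.0 - lam) * mean_A

def period_two_amplitude(Gamma, lam, eps, grid=2000, iterations=200):
    """Largest non-zero root z* of Eq. (14), or None if none is found on the grid."""
    z_max = Gamma / (2.0 - lam)   # upper bound on z*, since <A|z> < 1

    def g(z):
        return rhs_eq14(z, Gamma, lam, eps) - z

    # Scan downward from z_max for a point where the right-hand side exceeds z
    lo = None
    for k in range(grid - 1, 0, -1):
        z = z_max * k / grid
        if g(z) > 0.0:
            lo = z
            break
    if lo is None:
        return None
    hi = z_max                    # g(z_max) <= 0, so [lo, hi] brackets a root
    for _ in range(iterations):   # plain bisection on the bracket
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

With λ = 0.35 and ε = 0.05 this gives an amplitude between 1.6 and 1.7 at Γ = 3, a value very close to Γ/(2 - λ) at Γ = 30, and no non-trivial solution at Γ = 0.5.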

So when $\lambda$ is large enough and $\Gamma$ is increased from a small value, we expect a continuous
transition, whereas when $\Gamma$ is large enough and $\lambda$ increases from zero across the
transition, we find a sharp discontinuous transition and phase coexistence. The latter
matches closely the numerical finding of [5] which reported the onset of instabilities
occurring for very large times in a more complex version of the present model.

It is worth commenting on the effect of $\epsilon$ on the dynamics. In the fixed point $y^* = 0$, $\epsilon$
controls the average fraction of participating agents, through $u$:

$$\langle n_i \rangle = \frac{1 + \tanh(u)}{2} = \frac{1 + \tanh(-\Gamma\epsilon/\lambda)}{2}$$

This decreases with $\epsilon$ and it vanishes if $\epsilon \gg \lambda/\Gamma$ is large and positive. On the other
hand $\langle n_i \rangle \to 1$ if $\epsilon$ is large and negative, i.e. all agents play if the incentive is large
enough. This is the main effect of the sign of $\epsilon$. Indeed both the stability condition as
well as the value of $z$ and the fluctuation properties (see below) are independent of the
sign of $\epsilon$.



4. Fluctuations in the stochastic system 

In order to explain this rich behavior, it is necessary to study the fluctuations in the 
stochastic system. 

Let us first notice that $\langle y^2 \rangle \simeq z^2$ on the periodic solution $z^* \neq 0$. On this solution, the
fluctuations in $A$ are of order one. Indeed they are given by

$$\langle A^2 \rangle = \langle A \rangle^2 + \langle \delta A^2 \rangle \simeq \left( \frac{2 - \lambda}{\Gamma} \right)^2 z^2 + O(N^{-1})$$

where the noise term $\langle \delta A^2 \rangle \sim O(N^{-1})$ has been neglected. In brief, the fluctuations are
dominated by the periodic oscillation and the noise has a negligible effect.

Things are a bit more complicated for fluctuations around the $y^* = 0$ fixed point. The
starting point of our analysis is (7). We take the square of each side and average over the
realizations of the stochastic process. In the stationary state, after rearranging terms,
this yields

$$[1 - (1 - \lambda)^2] \langle y^2 \rangle + 2\Gamma(1 - \lambda) \langle yA \rangle - \Gamma^2 \langle A^2 \rangle = 0 \qquad (15)$$

Now, the last two averages take into account both the effect of random choice, for a
fixed value of $y$, as described by (8) and (11), and the effect induced by the fluctuations
of $y$ itself. For the first term, we may write

$$\langle yA \rangle = \int dy \, P(y) \, y \, \langle A|y \rangle \simeq \left. \frac{d \langle A|y \rangle}{dy} \right|_{y=0} \langle y^2 \rangle + o(\langle y^2 \rangle)$$

where in the second equality we have expanded the conditional average around the fixed
point $y = 0$. This is a reasonable approximation: because the variance of the stochastic
noise is of order $1/N$, the fluctuations $\delta y \sim 1/\sqrt{N}$ are also likely to be small. Therefore
for large $N$ we can safely neglect higher order terms in the above equation. Likewise, we
can approximate

$$\langle A^2 \rangle = \int dy \, P(y) \left[ \langle \delta A^2|y \rangle + \langle A|y \rangle^2 \right] \qquad (16)$$
$$\simeq \langle \delta A^2|0 \rangle + \left( \left. \frac{d \langle A|y \rangle}{dy} \right|_{y=0} \right)^2 \langle y^2 \rangle \qquad (17)$$

Here, since $\langle \delta A^2|0 \rangle \sim 1/N$, we have neglected contributions to the noise term due to
fluctuations in $y$, as they only matter to higher orders in $1/N$. With these, (15) can easily
be solved to yield the fluctuations of $y$:

$$\langle y^2 \rangle = \frac{\Gamma^2}{2N \cosh^2(u)} \, \frac{1}{1 - \left[ 1 - \lambda - \Gamma/\cosh^2(u) \right]^2} \qquad (18)$$

Notice that fluctuations diverge as the instability line is approached, as expected. Equation (18)
agrees perfectly with numerical simulations, as shown in Fig. 2.
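
For completeness, the algebra leading from (15) to (18) can be spelled out as follows. This is a sketch under the small-fluctuation approximations stated above; the shorthand $c = \cosh^2(u)$ is introduced here only for brevity, and it uses $d\langle A|y \rangle/dy = 1/c$ at $y = 0$ together with $\langle \delta A^2|0 \rangle = 1/(2Nc)$ from (11).

```latex
\begin{align*}
\langle yA \rangle &\simeq \frac{\langle y^2 \rangle}{c}, \qquad
\langle A^2 \rangle \simeq \frac{1}{2Nc} + \frac{\langle y^2 \rangle}{c^2}, \\
0 &= \left[ 1 - (1-\lambda)^2 \right] \langle y^2 \rangle
   + \frac{2\Gamma(1-\lambda)}{c} \langle y^2 \rangle
   - \frac{\Gamma^2}{c^2} \langle y^2 \rangle - \frac{\Gamma^2}{2Nc} \\
  &= \left[ 1 - \left( 1 - \lambda - \frac{\Gamma}{c} \right)^2 \right] \langle y^2 \rangle
   - \frac{\Gamma^2}{2Nc}.
\end{align*}
```

The bracket in the last line vanishes when $|1 - \lambda - \Gamma/c| = 1$, which is the boundary of the stability condition (12); this is where the predicted fluctuations diverge.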

The approach discussed here can easily be extended, along the lines of Ref. [11], in order 
to describe the onset of anomalous fluctuations beyond the Gaussian regime, close to the
instability line. 



5. Conclusion 



We have shown that discounting can have non-trivial consequences on the dynamics 
of a system of interacting adaptive agents. From a naive point of view, this sounds 
counterintuitive as discounting decreases the correlation time, hence the strength of 
non-linear effects. The interplay with the stochastic nature of the learning process (due 
to experimentation) is crucial. Indeed the instability does not arise (for $\lambda < 1$) either
if the noise is too strong (small $\Gamma$) or if the dynamics is close to being deterministic
($\Gamma$ large).

We argue that the dynamic instability discussed here is of the same nature as that 
which is responsible for the onset of large fluctuations in the Minority Game discussed 
in Ref. [5]. Indeed, the mechanism discussed here is likely of a very general nature and a
similar phase structure may arise in other games with discounting and experimentation. 